63 research outputs found

    LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery

    Large Language Models (LLMs) have transformed the landscape of artificial intelligence, but their enormous size presents significant challenges in terms of computational cost. We introduce LoRAShear, a novel, efficient approach to structurally prune LLMs and recover knowledge. Given a general LLM, LoRAShear first creates dependency graphs over LoRA modules to discover minimally removable structures and analyze the knowledge distribution. It then proceeds with progressive structured pruning on LoRA adaptors and enables inherent knowledge transfer to better preserve the information in the redundant structures. To recover the knowledge lost during pruning, LoRAShear proposes dynamic fine-tuning schemes with dynamic data adaptors to effectively narrow the performance gap to the full models. Numerical results demonstrate that, using only one GPU for a couple of GPU days, LoRAShear reduces the footprint of LLMs by 20% with only 1.0% performance degradation, significantly outperforming the state of the art. The source code will be available at https://github.com/microsoft/lorashear
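    The core mechanism described above, grouping parameters by structural dependency and pruning the least important groups together, can be sketched as follows. This is an illustrative toy, not the actual LoRAShear implementation; the function name, data layout, and L2-norm importance score are all assumptions.

```python
import numpy as np

def prune_groups(weights, groups, ratio=0.2):
    """Score each dependency group by its aggregate L2 norm and return
    the indices of the lowest-scoring groups, which would be removed
    together to keep the pruned network structurally valid.

    weights: dict mapping parameter name -> np.ndarray
    groups:  list of groups; each group is a list of (name, row_indices)
             pairs that must be removed jointly.
    """
    scores = []
    for group in groups:
        # importance of a group = combined norm of all rows it owns
        s = sum(np.linalg.norm(weights[name][rows]) for name, rows in group)
        scores.append(s)
    k = int(len(groups) * ratio)              # number of groups to drop
    return sorted(np.argsort(scores)[:k].tolist())
```

    In a real system the groups would come from the dependency graph over LoRA modules, so that removing a group never orphans a downstream tensor dimension.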

    OTOv2: Automatic, Generic, User-Friendly

    Existing model compression methods via structured pruning typically require complicated multi-stage procedures. Each individual stage demands numerous engineering efforts and domain knowledge from the end users, which prevents wider application to broader scenarios. We propose the second generation of Only-Train-Once (OTOv2), which automatically trains and compresses a general DNN only once, from scratch, to produce a more compact model with competitive performance, without fine-tuning. OTOv2 is automatic, pluggable into various deep learning applications, and requires minimal engineering effort from the users. Methodologically, OTOv2 proposes two major improvements: (i) Autonomy: automatically exploits the dependency structure of general DNNs, partitions the trainable variables into Zero-Invariant Groups (ZIGs), and constructs the compressed model; and (ii) Dual Half-Space Projected Gradient (DHSPG): a novel optimizer to more reliably solve structured-sparsity problems. Numerically, we demonstrate the generality and autonomy of OTOv2 on a variety of model architectures such as VGG, ResNet, CARN, ConvNeXt, DenseNet, and StackedUnets, the majority of which cannot be handled by other methods without extensive handcrafting effort. Together with benchmark datasets including CIFAR10/100, DIV2K, Fashion-MNIST, SVHN, and ImageNet, its effectiveness is validated by performing competitively with, or even better than, the state of the art. The source code is available at https://github.com/tianyic/only_train_once
    Comment: Published at ICLR 2023. Note that a few images of dependency graphs could not be included in the arXiv version due to exceeding the size limit.
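    The half-space projection idea behind DHSPG can be caricatured in a few lines: important groups take an ordinary gradient step, while groups flagged as redundant are projected to exactly zero whenever the step would carry them out of the half-space anchored at the current iterate, which is what produces whole-group sparsity. This is a deliberately simplified single-half-space toy, not the actual dual-half-space optimizer; all names and the redundancy flags are assumptions.

```python
import numpy as np

def half_space_step(x, grad, groups, redundant, lr=0.1):
    """One toy projected-gradient step over variable groups.

    x, grad:   flat parameter and gradient vectors (np.ndarray)
    groups:    list of index lists partitioning x (the ZIG analogue)
    redundant: set of group ids targeted for removal
    """
    x = x.copy()
    for gid, idx in enumerate(groups):
        step = x[idx] - lr * grad[idx]          # plain gradient step
        if gid in redundant and np.dot(step, x[idx]) < 0:
            # the step left the half-space {y : <y, x_g> >= 0},
            # so project the whole group to zero
            step = np.zeros_like(step)
        x[idx] = step
    return x
```

    Driving entire groups to exact zeros, rather than merely small values, is what lets the compressed model be constructed directly without fine-tuning.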

    Towards Automatic Neural Architecture Search within General Super-Networks

    Existing neural architecture search (NAS) methods typically rely on pre-specified super deep neural networks (super-networks) with handcrafted search spaces. Such requirements make them challenging to extend to general scenarios without significant human expertise and manual intervention. To overcome these limitations, we propose the third generation of Only-Train-Once (OTOv3). OTOv3 is perhaps the first automated system that trains general super-networks and produces high-performing sub-networks in a one-shot manner, without pretraining or fine-tuning. Technologically, OTOv3 delivers three notable contributions to minimize human effort: (i) automatic search space construction for general super-networks; (ii) a Hierarchical Half-Space Projected Gradient (H2SPG) that leverages the dependency graph to ensure network validity during optimization and reliably produces a solution with both high performance and hierarchical group sparsity; and (iii) automatic sub-network construction based on the super-network and the H2SPG solution. Numerically, we demonstrate the effectiveness of OTOv3 on a variety of super-networks, including RegNet, StackedUnets, SuperResNet, and DARTS, over benchmark datasets such as CIFAR10, Fashion-MNIST, ImageNet, STL-10, and SVHN. The sub-networks computed by OTOv3 achieve competitive or even superior performance compared to the super-networks and other state-of-the-art methods. The library will be released at https://github.com/tianyic/only_train_once
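    The "network validity" constraint mentioned above reduces to a reachability check: after zeroing out whole operations of the super-network, an input-to-output path must survive in the dependency graph. A minimal sketch of that check, under the assumption that the graph is a simple DAG of named nodes (function and data layout are hypothetical):

```python
def remains_valid(edges, nodes, removed, src, dst):
    """Return True if deleting `removed` nodes still leaves a path
    src -> dst -- the structural validity a hierarchical pruner must
    preserve when it sparsifies entire operations."""
    alive = set(nodes) - set(removed)
    adj = {n: [] for n in alive}
    for u, v in edges:
        if u in alive and v in alive:
            adj[u].append(v)
    stack, seen = [src], set()      # iterative depth-first search
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        if n in seen or n not in alive:
            continue
        seen.add(n)
        stack.extend(adj[n])
    return False
```

    H2SPG folds a hierarchical version of this constraint into the optimization itself, rather than checking it after the fact.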

    Electrochemical reforming of ethanol with acetate co-production on nickel cobalt selenide nanoparticles

    The energy efficiency of water electrolysis is limited by the sluggish reaction kinetics of the anodic oxygen evolution reaction (OER). To overcome this limitation, the OER can be replaced by a less demanding oxidation reaction, which in the ideal scenario could even generate additional valuable chemicals. Herein, we focus on the electrochemical reforming of ethanol in alkaline media to generate hydrogen at a Pt cathode and acetate as a co-product at a NiCoSe anode. We first detail the solution synthesis of a series of NiCoSe electrocatalysts. By adjusting the Ni/Co ratio, the electrocatalytic activity and selectivity for the production of acetate from ethanol are optimized. Best performances are obtained at low substitutions of Ni by Co in the cubic NiSe phase. Density functional theory calculations reveal that Co substitution can effectively enhance ethanol adsorption and decrease the energy barrier for the first-step dehydrogenation during its conversion to acetate. However, we experimentally observe that too large an amount of Co decreases the ethanol-to-acetate Faradaic efficiency from values above 90% to just 50%. At the optimized composition, the NiCoSe electrode delivers a stable chronoamperometry current density of up to 45 mA cm⁻², corresponding to 1.2 A g⁻¹, in a 1 M KOH + 1 M ethanol solution, with a high ethanol-to-acetate Faradaic efficiency of 82.2% at a relatively low potential of 1.50 V vs. RHE, and with an acetate production rate of 0.34 mmol cm⁻² h⁻¹.
    This work was supported by the start-up funding at Chengdu University. It was also supported by the European Regional Development Funds and by the Spanish Ministerio de Economía y Competitividad through the project SEHTOP (ENE2016-77798-C4-3-R), MCIN/AEI/10.13039/501100011033/ project, and NANOGEN (PID2020-116093RB-C43). X. Wang, C. Xing, X. Han, R. He, Z. Liang, and Y. Zhang are grateful for the scholarship from the China Scholarship Council (CSC). X. Han and J. Arbiol acknowledge funding from Generalitat de Catalunya 2017 SGR 327. ICN2 acknowledges support from the Severo Ochoa Programme (MINECO, Grant no. SEV-2013-0295). IREC and ICN2 are funded by the CERCA Programme / Generalitat de Catalunya
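    The reported figures are internally consistent, which can be checked with a short Faradaic-efficiency calculation. Assuming the standard 4-electron oxidation of ethanol to acetate (CH3CH2OH + 5OH⁻ → CH3COO⁻ + 4H2O + 4e⁻) and using only numbers from the abstract:

```python
F = 96485.0      # C/mol, Faraday constant
n_e = 4          # electrons transferred per ethanol -> acetate
j = 45e-3        # A/cm^2, reported current density
rate = 0.34e-3   # mol/(cm^2 h), reported acetate production rate

charge_per_hour = j * 3600.0         # total anodic charge per cm^2 per hour (C)
charge_to_acetate = n_e * F * rate   # charge consumed making acetate (C)
fe = 100.0 * charge_to_acetate / charge_per_hour
print(f"Faradaic efficiency = {fe:.1f} %")   # about 81 %, close to the reported 82.2 %
```

    The small gap to the quoted 82.2% is expected, since the chronoamperometry current and the production rate are averaged over different windows.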

    A possible 250-second X-ray quasi-periodicity in the fast blue optical transient AT2018cow

    Fast blue optical transients (FBOTs) are a new population of extragalactic transients of unclear physical origin. A variety of mechanisms have been proposed, including a failed supernova explosion, shock interaction with a dense medium, a young magnetar, accretion onto a compact object, and a stellar tidal disruption event, but none is conclusive. Here we report the discovery of a possible X-ray quasi-periodicity signal with a period of ~250 s (at a significance level of 99.76%) in the brightest FBOT, AT2018cow, through analysis of XMM-Newton/PN data. The signal is independently detected at the same frequency in the average power density spectrum of data taken with the Swift telescope, covering 6 to 37 days after the optical discovery, though at a lower significance level (94.26%). This suggests that the QPO frequency was stable over at least 1.1 × 10^4 cycles. Assuming the ~250 s QPO to be a scaled-down analogue of those typically seen in stellar-mass black holes, a black hole mass of ~10^3-10^5 solar masses can be inferred. The overall X-ray luminosity evolution can be modeled as the tidal disruption of a star by a black hole of ~10^4 solar masses, providing a viable mechanism for AT2018cow. Our findings suggest that other bright FBOTs may also harbor intermediate-mass black holes.
    Comment: 18 pages, 10 figures. Accepted for publication in Research in Astronomy and Astrophysics
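    The detection method, picking a QPO out of a power density spectrum, can be illustrated on synthetic data. The sketch below injects a weak 250 s sinusoid into white noise and recovers the period from the FFT power spectrum; the sampling, amplitude, and noise level are illustrative assumptions, not the properties of the actual XMM-Newton/PN light curve.

```python
import numpy as np

# Synthetic light curve: a weak 250 s sinusoid buried in white noise,
# sampled every 5 s over 20 ks -- a stand-in for an X-ray time series.
rng = np.random.default_rng(0)
dt, period = 5.0, 250.0
t = np.arange(0.0, 20000.0, dt)
flux = np.sin(2 * np.pi * t / period) + rng.normal(0.0, 2.0, t.size)

# Power density spectrum via the FFT; a QPO at 250 s appears as a
# peak near 1/250 Hz = 4 mHz.
power = np.abs(np.fft.rfft(flux - flux.mean())) ** 2
freqs = np.fft.rfftfreq(t.size, d=dt)
peak = freqs[np.argmax(power[1:]) + 1]   # skip the zero-frequency bin
print(f"detected period ~ {1 / peak:.0f} s")
```

    Assessing the significance of such a peak against red noise, as done in the paper, requires considerably more care than this toy (e.g. fitting the broadband noise continuum and accounting for the number of trial frequencies).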